Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for cargo nextest && Fixed support for deno #3920

Open
wants to merge 224 commits into
base: main
Choose a base branch
from

Conversation

spigaz
Copy link

@spigaz spigaz commented Apr 10, 2024

This is still work in progress:
https://nexte.st/book/custom-test-harnesses.html

Progress

  • Added support for -h and --help w/tests
  • Added support for -V and --version w/tests
  • Added support for --list --format terse w/tests
  • Added support for --list --format --ignored w/tests
  • Add support for <test-name> --nocapture --exact w/tests
  • Reinstated support for --include-ignored w/tests
  • Reinstated support for --skip name w/tests
  • Reinstated support for --skip=name w/tests
  • Reinstated support for filter w/tests
  • Solved the temporary directory conflict caused by the concurrent invocation
  • Update --include-ignored to use clap
  • Update --skip name to use clap
  • Added tests to execute tests on nodejs
  • Fixed support and added tests to execute tests on deno
  • Add tests to execute tests on chrome
  • Added tests to execute tests on firefox
  • Add tests to execute tests on safari
  • Add tests to execute tests on edge
  • Add tests with cargo nextest invoking wasm-bindgen-test-runner
  • Use feature! macro to simplify testing
  • Use feature! test_mode parameter to execute the specification over all the test_modes

Updated to use clap

  • docopt was used already in wasm-bindgen, but its unmaintained https://github.com/docopt/docopt.rs and recommends clap or structopt, but structopt's docs state "As clap v3 is now out, and the structopt features are integrated into (almost as-is), structopt is now in maintenance mode"

Updated the macro wasm_bindgen_test

  • To allow listing handling as required by nextext

Testing

  • I wasn't sure what was the best place to put the tests in, because of the custom test runner in the cli crate, so I placed them on main tests folder.

  • I used a variation of BDD that I use for many years now.
    -- Although it seems a bit more verbose, it makes writing tests a lot simpler and faster, anyone can understand them and add new ones.
    -- When something breaks its very easy to understand why.
    -- Although conter-intuitive, in practice I have found that they are a lot easier to update on refactors.

Overhead

  • When used from cargo nextest the overhead is pretty real, as it has to load the runtime and the wasm into it, so it can be a lot, but it runs them in parallel, so it makes up for it on tests that take a long to execute.

Architecture:

  • I would prefer to organize the code into:
    -- a folder with the CLI and environment stuff
    -- a folder with the wasm handling
    -- a folder with the runtimes

@daxpedda
Copy link
Collaborator

Apologies for the delay, I was on vacation, still catching up.
Planning to take a look tomorrow!

@spigaz
Copy link
Author

spigaz commented Apr 18, 2024

That's okay!
I still have one missing piece ahead to get it working with cargo nextest, so its fine.

@spigaz
Copy link
Author

spigaz commented Apr 22, 2024

Well its working, actually its my second version working, the fist one used locks.

But according to the the library documentation it shouldn't work (at least cross-platform):

See the tests in lib.rs for cross-platform lock behavior that may be relied upon
// Concurrent shared access is OK, but not shared and exclusive.
// Once all shared file locks are dropped, an exclusive lock may be created;
// No other access is possible once an exclusive lock is created.
// Once the exclusive lock is dropped, the second file is able to create a lock.
https://github.com/danburkert/fs2-rs/blob/master/src/lib.rs

Anyway, the overhead was as bad as I expected, as each test execution now requires the same overhead as a complete assembly, I only tested on firefox so far, about 12-13s each, thermal throttled.

But when tests take longer than the overhead, the extra cores start to payoff.

I'm having thermal throttling issues, but my speed boost might be around 3x when I have the tests configured for very heavy parameters, its a property based testing variation.

@spigaz spigaz changed the title WIP: Add Support for cargo nextest Add Support for cargo nextest Apr 23, 2024
@spigaz
Copy link
Author

spigaz commented Apr 27, 2024

I was waiting for your review, because as I do more tests, I end up finding more stuff to fix, making the review harder on you. Sorry.

I ended up fixing support for deno, not sure why, but it was for sure broken.
I added some tests and updated the github action, so now its under control.
You can cherry pick that if you want.

@spigaz spigaz changed the title Add Support for cargo nextest Add Support for cargo nextest && Fixed support for deno Apr 27, 2024
@spigaz
Copy link
Author

spigaz commented May 24, 2024

@daxpedda I have been trying to create a macro to simplify the tests, this will be particular useful for tests that should be run over the different runtimes supported.

I'm still working on the syntax, as the macro by itself, is allowing different usage patterns.

But right now, this is the format already working:

feature! {
    given_there_is_an_assembly_with_one_failing_test();
    when_wasm_bindgen_test_runner_is_invoked_with_the_option("-V");

    "Outputs the version" {
        then_the_standard_output_should_have(
            &format!("wasm-bindgen-test-runner {}", env!("CARGO_PKG_VERSION")),
        );
    }

    "Returns success" {
        then_success_should_have_been_returned();
    }
}

It expands to two tests

#[test]
fn outputs_the_wasm_bindgen_test_runner_version_information_feature() {
    let mut context = Context::new();
    given_there_is_an_assembly_with_one_failing_test(&mut context);
    when_wasm_bindgen_test_runner_is_invoked_with_the_option(&mut context, "-V");
    then_the_standard_output_should_have(
        &context,
        &format!("wasm-bindgen-test-runner {}", env!("CARGO_PKG_VERSION")),
    );
}

#[test]
fn returns_success_feature() {
    let mut context = Context::new();
    given_there_is_an_assembly_without_anything(&mut context);
    when_wasm_bindgen_test_runner_is_invoked_with_the_option(&mut context, "-V");
    then_success_should_have_been_returned(&context);
}

If the target platform is wasm it uses [wasm_bindgen_test::wasm_bindgen_test] instead.

This allows for a more compact file, because there are sometimes 5-7 different outcomes for a single execution context

The idea is by default to respect the single outcome, allowing easy troubleshooting of regressions, but its possible to aggregate the executions on CI for faster execution times.

spigaz added 21 commits May 24, 2024 19:18
…warning to invocation with_an_empty_assembly.
…gen-cli because of dependency of wasm-bindgen-test-runner. Updated assembly builder to avoid runner runtime conflicts.
…ation with_an_assembly with_one_successful_test.
…ructure to allow multiple instances of the runner using the same assembly to make the tests faster.
…t generate random suffix for assembly directory name.
…Execution to invocation with_an_assembly with_one_successful_test.
…ation with_an_assembly with_one_failing_test.
…to invocation with_an_assembly with_one_failing_test.
spigaz added 17 commits May 24, 2024 19:18
…est 1 standard output to the invocation with_an_assembly with_arguments TESTNAME --nocapture --exact with_successful_test_match_1_of_2.
…est 1 standard output to the invocation with_an_assembly with_arguments TESTNAME --nocapture --exact with_successful_test_match_2_of_2.
Copy link
Collaborator

@daxpedda daxpedda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use docopt

  • It was used already in wasm-bindgen
  • Not sure if you want to use something more modern, its easy to change
  • Just checked its unmaintained https://github.com/docopt/docopt.rs and recommends clap or structopt, but structopt's docs state "As clap v3 is now out, and the structopt features are integrated into (almost as-is), structopt is now in maintenance mode"

I think updating to Clap would be nice, seeing that Cargo uses it as well.
That said, lets do this in a follow-up PR.

Updated the macro wasm_bindgen_test

  • To export and additional method to allow listing handling as required by nextext
  • Not sure if it might make sense to override current export

I think it would be better to update the current macro if possible.

Testing

  • I wasn't sure what was the best place to put the tests in, because of the custom test runner in the cli crate, so I placed them on main tests folder.
  • I used a variation of BDD that I use for many years now.
    -- Although it seems a bit more verbose, it makes writing tests a lot simpler and faster, anyone can understand them and add new ones.
    -- When something breaks its very easy to understand why.
    -- Although conter-intuitive, in practice I have found that they are a lot easier to update on refactors.

I'm unsure what exactly is going on here and I didn't review the tests yet, but from a glance I would like to change the following:

  • Shouldn't the tests be in /crates/cli/tests instead of in /tests?
  • Not sure why some folders are prefixed with _.
  • Tests include extremely repetitive code and use ambiguous terms like "assembly". While I don't mind having all these files, I would prefer if we can cut down on the repeated code entirely.
  • Why did you choose Firefox to be its own target? We should use the name "browser" instead.

Overhead

  • When used from cargo nextest the overhead is pretty real, as it has to load the runtime and the wasm into it, so it can be a lot, but it runs them in parallel, so it makes up for it on tests that take a long to execute.

That was to be expected, in follow-up steps we should discuss how to cut that down. The first idea that comes to mind is to introduce a way to start and end the headless browser separately from all the tests. That way each test doesn't spawn its own headless browser, which should cut down on overhead significantly.

Additionally some tests could be run in parallel without requiring their own context, e.g. tests in workers. But that would require some significant refactoring.

Architecture:

  • I would prefer to organize the code into:
    -- a folder with the CLI and environment stuff
    -- a folder with the wasm handling
    -- a folder with the runtimes

Considering that the changes aren't actually that big, I would prefer if we do any further refactoring in a separate PR.

@daxpedda I have been trying to create a macro to simplify the tests, this will be particular useful for tests that should be run over the different runtimes supported.

I think that's a great idea, but lets not use the name feature, because its unclear exactly what that means. Simply test or multi_target_test (the term target is clear in this context) would be enough I believe.


Again, I apologize for the delay. I spontaneously extended my vacation and after coming back my backlog was so huge I had to prioritize other things until I reached this PR.

Comment on lines +65 to +66
[dev-dependencies]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[dev-dependencies]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we move any changes to Deno to a separate PR to make it easier for me to review.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no problem with that, but you will have to guide me on the best way to do that.

It wasn't my intention, I though it was working, so I added a test for it, and then it needed two fixes without them it wasn't working for sure, as you can test.

Copy link
Collaborator

@daxpedda daxpedda Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deno is different then the browser issue, the main reason why I don't want it in this PR is because I can't review it.
So I'm fine with proceeding here simply without testing Deno.

EDIT: Maybe I'm missing something, but my assumption here is that without any changes to Deno everything will still work, its just not tested on Deno.

Comment on lines +214 to +216
None if std::env::var("WASM_BINDGEN_USE_BROWSER").is_ok() => TestMode::Browser {
no_modules: std::env::var("WASM_BINDGEN_USE_NO_MODULE").is_ok(),
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?
I think we could at least discuss and do this in a separate PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I added it to be able to execute the tests on the browser without having require the runtime in the code, that way you can check if your assembly runs in all runtimes without having to recompile it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great feature, but still belongs in a separate PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same issue with the deno fixes, I didn't planned for them, both had the same underlying objectives:

  • add tests to make sure the application remained working as close as possible to the original, because of the docopt and nextest implied changes
  • enforce that the nextest api is implemented consistently across all the supported runtimes

That being said, I actually do agree with you, but please advise me the best way to do this.
If I open PRs for them, once they are approved and merged, should a rebase to the latest version work?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see ... I guess the issue is that running the tests on all targets would be more complicated without this change.
I'm fine with leaving it in this PR in this case, but it needs some added documentation.

@@ -23,6 +23,7 @@ bin-dir = "wasm-bindgen-{ version }-{ target }/{ bin }{ binary-ext }"
docopt = "1.0"
env_logger = "0.8"
anyhow = "1.0"
fs2 = "0.4.3"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this was an accidental leftover?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember, I'm going to review that.

Comment on lines +84 to +85
Some(Some(_)) => "$",
Some(None) => "$",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Some(Some(_)) => "$",
Some(None) => "$",
Some(_) => "$",

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was a left over, that I didn't clean up, sorry.

@spigaz
Copy link
Author

spigaz commented Jun 13, 2024

@daxpedda That's okay, the only issue is that the details aren't as fresh as they were.

I already improved the tests a lot, and I hope I'm able to improve them a bit more still, my objective is to make them more clear and intuitive and easier to change.

I haven't pushed those changed yet, because there are some things remaining in the macro.

I have no issues with the naming, I just use a naming from BDD, but different perspectives always enrich the solution.

I'm going to finish those, once I push that, I'm going to tackle your requests one by one.

@daxpedda daxpedda added the waiting for author Waiting for author to respond label Jun 23, 2024
@spigaz
Copy link
Author

spigaz commented Jun 26, 2024

Just a quick update, I was able to move forward with the multi_target_test model, I'm still flushing out details in the library as I convert the existing tests I created.

But its pretty much working, right now it already runs on all the runtimes that we specify, the problem is that its flushing issues and inconsistencies specific as expected.

One of the problems was that the browser was adding extra tabs to the output (I used cargo test with a #[test] as reference), its fixed now.

The other issue I'm having is some instability in the Safari Driver, unfixed for now.

But aside from that, the model seems to work well to ensure there is a consistent behaviour across all runtimes, but I should know more as I move forward.

I'll try to finish this part ASAP, but the blockers I believe will be only the issues that it will uncover.

@daxpedda
Copy link
Collaborator

The other issue I'm having is some instability in the Safari Driver, unfixed for now.

FWIW, I've been having spurious failures using Safaridriver lately as well, so far the only error I encountered is a port binding issue. My guess here is that the new ARM MacOS runners have some issue with binding random ports in quick succession or an issue around that with the driver itself.

@spigaz
Copy link
Author

spigaz commented Jun 29, 2024

The other issue I'm having is some instability in the Safari Driver, unfixed for now.

FWIW, I've been having spurious failures using Safaridriver lately as well, so far the only error I encountered is a port binding issue. My guess here is that the new ARM MacOS runners have some issue with binding random ports in quick succession or an issue around that with the driver itself.

I was able to trace down the issue I was having, it was caused by multiple parallel safari invocations, once I added a lock to only allow one at a time, the problem was gone.

That lock doesn't limit test execution in practice, as Safari by design only allows one session at a time
https://developer.apple.com/documentation/webkit/about_webdriver_for_safari#2957226

The issue seems to be triggered by each safaridriver triggering the execution of a Safari --automation, and for some reason when that happens, even with workarounds, they have to be manually terminated for the execution to continue.

This solution seems to be cleaner, but the Safari --automation remains in memory, as it did before.

I removed most experiments, but left one, I moved the BackgroundChild into inside the Client, I did it because I was having some issues with the drop order, but left it in, because the code was cleaner, not sure if those issues remain.

I'm going to add some more safari tests to stress it more, then I'll add a parallel PR like I did with deno.

@spigaz
Copy link
Author

spigaz commented Jul 1, 2024

The other issue I'm having is some instability in the Safari Driver, unfixed for now.

FWIW, I've been having spurious failures using Safaridriver lately as well, so far the only error I encountered is a port binding issue. My guess here is that the new ARM MacOS runners have some issue with binding random ports in quick succession or an issue around that with the driver itself.

I'm going to add some more safari tests to stress it more, then I'll add a parallel PR like I did with deno.

I ended up still achieving some more instability, that led me to beef up the lock, it seems pretty robust now.

But sometimes Safari --automation still refuses to allow the creation of a new session, so in those situations, I updated it to kill it, and ask safaridriver again for a new session, forcing it to initiate it again, that seems to fix the remaining issues.

I tested terminating the Safari --automation for every wasm-bindgen-test-runner execution, but that imposes a severe delay in the tests execution overall.

Right now, as I converted more tests into the multi_target_test mode, I'm dealing with some inconsistencies of output between the targets, but the Safari instability seems at least a lot better.

@daxpedda
Copy link
Collaborator

daxpedda commented Jul 8, 2024

@spigaz I believe you are spending quite a significant amount of time on testing facilities that don't have to be this perfect compared to all the other untested features wasm-bindgen contains (which is unfortunate of course). While this is really appreciated, I would prefer to split efforts into multiple PRs instead of trying to do everything at once.

@spigaz
Copy link
Author

spigaz commented Jul 8, 2024

@daxpedda That wasn't my objective, but nextest implied changes in many different places, including in the arguments handling.

As a rule I try to add tests before introducing changes, to make sure I don't make it worse.

Because of that and the nextest tests I ended up with a lot of tests, that when I generalised found even more issues that I tried to fix.

Right now I'm just trying to iron out some issues with safari to avoid breaking the CI, it seems okay already, perhaps some starvation in the safari instance access.

But my point is to make it like we did with the deno PR, using the test base as reference.

@daxpedda
Copy link
Collaborator

daxpedda commented Jul 8, 2024

The testing you are trying to implement here is greatly appreciated, but it is of way higher quality then this repository is used to. In wasm-bindgen there are a lot of features that aren't tested for every browser and every target. E.g. trying to fix running Safari tests or fixing Deno is out of scope for this PR.

What I'm trying to say is that bringing the PR to a mergable state could be way simpler if the tests are in a comparable scope to the rest of the repository. Extending and improving tests can be done in a separate PR.

Ofc I will leave the decision on how to proceed here to you, I'm certainly not against getting this well tested in the same PR!

@spigaz
Copy link
Author

spigaz commented Jul 13, 2024

@daxpedda I understand your reasoning, and to be honest, I'm trying to narrow things to make it easier for you to review.

The problem is that cargo nextest stresses wasm-bindgen-test a lot, in my repo, 771 times, that means that all the instability issues pop up.

Anyway for now, I'm not being able to trigger issues with any of the supported runtimes...

I was finally able to remove the hacks I added to get cargo nextest working, because of the shared directory, using a ResourceCoordinator.

I just have some minor things, then I'm going to let you take the lead on this and I can create as much PRs as necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting for author Waiting for author to respond
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants